1.1. Executive Summary

The  goal  of  our  research  effort  is  to  study  the  use  of  smart  pixel  technology  for  image 
processing  applications.  The  effort  consists  of  five  inter-related  projects  in  the  following  areas: 

1. Architecture and system design studies for image processing using smart pixels.

2.  Comparison  of  smart  pixel  approach  with  competing  electronic  solutions. 

3.  Design,  fabrication  and  test  of  smart  pixel  chipsets  for  image  computing. 

4.  Collaboration  with  system  groups  to  insert  our  chipsets  into  system  prototypes. 

5.  Collaboration  with  a  leading  supplier  of  integrated  circuit  (IC)  design  tools  to  adapt  their 
software  for  design  of  smart  pixel  ICs. 

1.2. Accomplishments/New Findings

Photonic FFT Processor Architecture - We have designed a high-performance photonic chipset for computing 1-D complex fast Fourier transforms (FFTs). The FFT is an important operation for many applications, such as image processing, high-speed control, and instrumentation. Our design is based on the hybrid CMOS-SEED technology. Performance benchmarks show our design to be significantly faster than current electronic implementations. Specifically, a fully pipelined system with 21 OEIC chips can compute a new 1,024-point complex FFT every 0.44 µs. A high-performance electronic system that uses 4 Sharp LH9124 FFT processor chips, 12 Sharp LH9320 address generator chips, 12 SRAM chips, and various glue-logic chips requires 31 µs for the same computation.
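As a quick check on the benchmark above, the sketch below converts the two per-transform times into transform rates and a speedup ratio. It assumes both times are in microseconds per FFT result; the function name is illustrative only.

```python
# Throughput comparison for the 1,024-point complex FFT benchmark.
# Assumption: both times are in microseconds between successive FFT results.

def fft_rate_per_second(time_per_fft_us: float) -> float:
    """FFT results per second, given the time between successive results (in us)."""
    return 1.0 / (time_per_fft_us * 1e-6)

photonic_us = 0.44     # 21-chip, fully pipelined hybrid CMOS-SEED system
electronic_us = 31.0   # 4 LH9124 + 12 LH9320 + 12 SRAM + glue logic

photonic_rate = fft_rate_per_second(photonic_us)
electronic_rate = fft_rate_per_second(electronic_us)
speedup = electronic_us / photonic_us

print(f"photonic:   {photonic_rate:,.0f} FFTs/s")
print(f"electronic: {electronic_rate:,.0f} FFTs/s")
print(f"speedup:    {speedup:.1f}x")
```

Under that reading, the pipelined photonic system delivers roughly 2.3 million transforms per second against about 32,000 for the electronic system, a ratio of about 70.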

Fine-Grain Multiprocessor OEIC - We have designed and fabricated a hybrid CMOS-SEED IC that integrates 512 bit-serial processors. Our design is targeted toward image processing applications, such as template matching, that require a large number of bit-serial operations.

The chip integrates approximately 400,000 MOS transistors and 4,096 optical I/O. Each processor contains a Logic-Only Unit (LOU), 100 bits of RAM, and a single high-speed register. The LOU circuit can compute any two-bit logic function in a single clock cycle. At a 100 MHz clock speed, this chip will achieve 51.2 billion 1-bit operations per second (512 processors × 100 MHz). An on-chip electrical interconnection network provides near-neighbor connections between processors. Global interconnections are to be implemented optically using free-space optical interconnects. This device has been fabricated, and initial optical tests at Lucent show correct operation.
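The report states only that the LOU computes any two-bit logic function in one clock cycle. One common way to realize that, assumed here purely for illustration and not taken from the chip design, is a 4-entry truth table indexed by the two input bits, so that a 4-bit function code selects one of the 16 possible two-input functions. The last lines reproduce the 51.2 billion operations-per-second figure. The names `lou` and the function-code constants are hypothetical.

```python
def lou(func_code: int, a: int, b: int) -> int:
    """Hypothetical Logic-Only Unit: evaluate any 2-input Boolean function.

    func_code is a 4-bit truth table; bit (2*a + b) holds the output for
    inputs (a, b). The 16 possible codes cover all 16 two-input functions.
    """
    if not 0 <= func_code < 16:
        raise ValueError("func_code must be a 4-bit value")
    return (func_code >> (2 * (a & 1) + (b & 1))) & 1

# Example function codes (bit 3 = output for a=1,b=1 ... bit 0 = a=0,b=0).
AND, OR, XOR, NAND = 0b1000, 0b1110, 0b0110, 0b0111

processors = 512
clock_hz = 100e6
ops_per_second = processors * clock_hz   # one 1-bit operation per processor per cycle
print(f"{ops_per_second / 1e9:.1f} billion 1-bit ops/s")
```

A table-driven unit like this needs no extra decoding logic per function, which is one reason it suits very small bit-serial processors.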

Coarse-Grain Multiprocessor OEIC - We have designed and fabricated a 64-bit microprocessor core IC with 192 optical I/O. Our design is targeted toward image processing applications requiring bit-parallel computations. This 3.5 mm² hybrid CMOS-SEED IC was electrically tested at 100 MHz with a performance of 100 million 64-bit instructions per second (MIPS). The processor design includes a 64-bit arithmetic logic unit (ALU) that implements 16 logic and 32 arithmetic functions. A 1 cm² chip can integrate thirty-two 64-bit processors and achieve 3,200 64-bit MIPS. Further performance improvements can be achieved using a 0.35 micron CMOS process. When the processor chip is combined with an appropriate photonic page buffer IC operating as cache memory, it becomes possible to build a compact, two-chip parallel processor system. We have optically tested this chipset at Optivision to verify correct operation at low speed.

Multiprocessor Switch OEIC - We have collaborated with Lucent Technologies to design, fabricate, and test a 16-channel, 16-bit/channel self-routing crossbar OEIC for multiprocessor switching. This chip integrates 120,000 MOS transistors with an array of 4,096 optical devices. The device has been optically tested at Lucent at a 50 MHz clock rate. A novel approach taken by Lucent in this project is to use WDM technology to wavelength-multiplex the 16 bits of each channel onto a single fiber.
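A self-routing crossbar carries its routing information with the data, so no central controller has to program the switch. The sketch below models one switching cycle of a 16-channel, 16-bit crossbar under assumptions that are ours rather than the report's: each input presents a (destination, payload) pair, and output conflicts are resolved by fixed priority to the lowest-numbered input. `crossbar_cycle` and `Request` are hypothetical names.

```python
from typing import List, Optional, Tuple

N_CHANNELS = 16
WORD_MASK = (1 << 16) - 1   # 16 bits per channel

Request = Optional[Tuple[int, int]]   # (destination port, 16-bit payload) or None

def crossbar_cycle(requests: List[Request]) -> Tuple[List[Optional[int]], List[int]]:
    """One cycle of a hypothetical 16x16 self-routing crossbar.

    Each input carries its own destination. When several inputs target the
    same output, the lowest-numbered input wins (an assumed fixed-priority
    policy) and the others are reported as blocked.
    """
    if len(requests) != N_CHANNELS:
        raise ValueError("expected one request slot per input channel")
    outputs: List[Optional[int]] = [None] * N_CHANNELS
    blocked: List[int] = []
    for src, req in enumerate(requests):
        if req is None:
            continue                      # idle input
        dest, payload = req
        if outputs[dest] is None:
            outputs[dest] = payload & WORD_MASK
        else:
            blocked.append(src)           # output already claimed this cycle
    return outputs, blocked

# Example: inputs 0 and 3 both address output 5; input 7 addresses output 2.
reqs: List[Request] = [None] * N_CHANNELS
reqs[0] = (5, 0xBEEF)
reqs[3] = (5, 0x1234)
reqs[7] = (2, 0x00FF)
outs, blocked = crossbar_cycle(reqs)
print(outs[5] == 0xBEEF, outs[2] == 0x00FF, blocked)   # True True [3]
```

A real switch would also need a policy for blocked inputs (retry, buffering, or back-pressure); the sketch simply reports them.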

Photonic Page Buffer Chipsets - We have designed, fabricated, and tested 6 OEICs for optical memory applications. Our chip technology, called hybrid CMOS-SEED, is based on flip-chip integration of submicron CMOS ICs with GaAs chips containing 2-D arrays of multiple-quantum-well (MQW) diode optical receivers and transmitters. The largest OEIC was designed jointly with Lucent Technologies, and it integrates 50 Kbits of static RAM (approximately 400,000 MOS transistors) with 4,096 optical devices. The memory is organized as a 512-bit wide, 100-bit deep random access memory with 512 logic-only unit (LOU) processors. The LOUs allow this device to perform high-speed image and data processing algorithms. This device has been fabricated, and initial tests show correct operation. Earlier, we collaborated with Lucent Technologies to demonstrate a 2 Kbit (21,000 MOS transistors) photonic page buffer IC. The 64 optical I/O channels on this IC were tested at 50 Mbps/channel optical data throughput. We have also successfully applied our work to a DARPA-funded program at Optivision, where we designed a 32-channel, 128-bit photonic page buffer OEIC that Optivision tested for parallel optical operation at 277 MHz.
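The memory organization described above (512 bits wide, 100 words deep, one LOU per bit column) can be modeled as a small data structure. The sketch assumes, purely for illustration, that the LOUs apply one two-input function column-wise between a stored page and an incoming page; the class and method names are hypothetical and not drawn from the chip design.

```python
WIDTH = 512   # bits per page; one LOU per bit column
DEPTH = 100   # pages (words) in the RAM

class PhotonicPageBuffer:
    """Hypothetical behavioral model of a 512-bit x 100-page buffer with LOUs."""

    def __init__(self) -> None:
        self.mask = (1 << WIDTH) - 1
        self.pages = [0] * DEPTH          # each page held as a 512-bit integer

    def write(self, addr: int, page: int) -> None:
        self.pages[addr] = page & self.mask

    def read(self, addr: int) -> int:
        return self.pages[addr]

    def lou_apply(self, addr: int, operand: int, func_code: int) -> int:
        """Apply one 2-input Boolean function to all 512 columns in parallel.

        func_code is a 4-bit truth table: bit (2*a + b) is the output when the
        stored bit is a and the incoming operand bit is b.
        """
        a = self.pages[addr]
        b = operand & self.mask
        result = 0
        for a_val in (0, 1):
            for b_val in (0, 1):
                if (func_code >> (2 * a_val + b_val)) & 1:
                    # OR in the minterm selected by this truth-table entry.
                    term_a = a if a_val else ~a
                    term_b = b if b_val else ~b
                    result |= term_a & term_b
        return result & self.mask         # drop bits above the page width

print(WIDTH * DEPTH)   # 51200 bits, i.e. the "50 Kbits" quoted above
```

Holding each page as one wide integer mirrors the hardware view that all 512 LOUs act on a page in the same clock cycle.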

High-Performance Test Fixture for OEICs - We have built a compact, 84 I/O channel, 100 Mbps/channel test fixture for testing and control of optoelectronic integrated circuits (OEICs). We have used this fixture to test over a dozen OEICs that were designed under this program. We have successfully transitioned this work to Hewlett-Packard for use in a DARPA-funded program on parallel optical links (DARPA POLO Project, DARPA PM: Anis Hussain). Our fixture is also being used at Lucent Technologies, and we are in the process of delivering a version to UCSD (Dr. Shaya Fainman).

1.3.  Personnel  Supported 

Richard Rozier - Ph.D. student in the Department of Electrical Engineering at UNCC. Mr. Rozier is a United States citizen. He is funded from a related AASERT (F49620-95-1-0425).

James Rorie - Ph.D. student in the Department of Electrical Engineering at UNCC. Mr. Rorie is a United States citizen. He is funded from a related AASERT (F49620-95-1-0425).

Jason Lambirth - M.S.E.E. student in the Department of Electrical Engineering at UNCC. Mr. Lambirth is a United States citizen.

Dr. Fouad Kiamilev - Assistant Professor of Electrical Engineering at UNCC. Dr. Kiamilev is the research advisor for all of the above students.

Dr.  Ashok  Krishnamoorthy  -  Member  of  Technical  Staff,  AT&T  Bell  Laboratories.  Dr. 
1.5. Chipset Development

Integrated circuits with free-space optical interconnection (FSOI), called smart pixels, show great potential for efficient implementation of high-performance parallel computing systems [1]. The use of FSOI technology to build massively parallel processors (MPPs) has been proposed previously [2,3]. In this approach, large numbers of simple, bit-serial processors are integrated on a single chip and interconnected using FSOI. While this architecture is efficient for bit-oriented logic calculations, it requires multiple clock cycles to complete arithmetic and logic operations that involve multiple bits. An alternative approach is to use a more sophisticated processor design capable of operating on multiple bits in a single clock cycle. While fewer such processors can be integrated on a single chip, the aggregate performance of this approach can potentially exceed that of the bit-serial approach for certain computations. The purpose of this study is to investigate the design of such a bit-parallel processor using smart pixel technology.

In this paper, we present the design of a hybrid CMOS-SEED 64-bit microprocessor core integrated circuit (IC). This 3.5 mm² IC was fabricated in 0.8 micron HP26G CMOS technology [4] and integrates approximately 12,000 MOS transistors. Functionally, the design integrates a 64-bit electrically scannable register (ESR) and a 64-bit arithmetic-logic unit (ALU) that provides a full range of Boolean and arithmetic functions. The chip contains 128 optical receiver circuits and 64 optical transmitter circuits, and it was electrically tested at 100 MHz with a performance of 100 million 64-bit instructions per second (MIPS). The processor is designed to optically input two 64-bit words and optically output one 64-bit word on every clock cycle. The clock and control signals are supplied to the chip electrically. Figure 1 shows the chip layout.


Figure  1.  Layout  of  the  64-bit  microprocessor  core  IC  for  hybrid  CMOS-SEED  technology. 

Our chip was designed to mate with a proven GaAs diode array based on the hybrid CMOS-SEED technology [5], which enables GaAs MQW photodetectors and modulators to be flip-chip bonded to commodity CMOS VLSI processes. It was fabricated as part of the 1995 AT&T-ARPA hybrid CMOS-SEED multi-project fabrication run [6]. At the time this paper was written, only the electrical version of the chip was available for testing. The receiver and transmitter circuits used on the chip have been demonstrated previously [7]; the active area of the processor core (i.e., the circuit area with optical I/Os but without the electrical padframe) is approximately 2 mm². A key element of our chip design is the integration of the optical I/O padframe directly over active digital CMOS circuitry, a technique demonstrated earlier [8]. Without this layout feature, the area of the processor core would be at least twice as large.

With our present design, a 1 cm² hybrid CMOS-SEED chip can easily integrate thirty-two (32) 64-bit processors that perform 3,200 64-bit MIPS when operating with a 100 MHz clock. Achieving the same performance with a bit-serial approach would require 2,048 bit-serial processors (64 × 32), also operating at 100 MHz. Both chip designs contain 4,096 optical receiver circuits and 2,048 optical transmitter circuits. However, the bit-parallel approach is more efficient for computations that involve 64-bit arithmetic operations and do not have enough parallelism to occupy all 2,048 bit-serial processors. For example, consider the problem of adding two vectors, each containing 32 64-bit numbers. This operation takes one clock cycle on the chip with 64-bit processors. On the other hand, 64 clock cycles are required for the same computation on the chip with bit-serial processors, and only 32 of the 2,048 processors are utilized in the calculation. While specific applications can benefit from either the bit-serial or the bit-parallel architecture, there are additional considerations in favor of the bit-parallel scheme, including:

1)  Better  software  compatibility  with  current  commercial  microprocessor  architectures  that  use 
32-bit  or  64-bit  word  sizes. 

2) Multiple-bit operations can be implemented with minimum latency. For example, a 64-bit addition can be calculated in one (1) clock cycle using a fast carry-lookahead adder circuit [9]. In contrast, a bit-serial adder requires sixty-four (64) clock cycles to perform the same computation.

3) With fewer processors, scheduling and distributing the workload among the processors becomes a simpler task.
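The cycle counts in this comparison can be reproduced with a simple model. The sketch below assumes each bit-parallel processor completes one 64-bit add per cycle, each bit-serial processor needs 64 cycles per add (one bit per cycle, carry held in a flip-flop), and independent additions are spread evenly across processors; it ignores data-movement overhead. The bit-serial adder is modeled bit by bit to confirm the 64-cycle count. All function names are illustrative.

```python
import math

WORD = 64

def bit_serial_add(a: int, b: int, width: int = WORD):
    """Ripple one bit per clock, LSB first, holding the carry in a flip-flop.

    Returns (sum modulo 2**width, clock cycles used).
    """
    carry, total, cycles = 0, 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i          # sum bit for this clock
        carry = (x & y) | (carry & (x ^ y))    # carry stored for next clock
        cycles += 1
    return total, cycles

def vector_add_cycles(n_elems: int, processors: int, cycles_per_add: int) -> int:
    """Cycles to add two n_elems-long vectors, one element per processor per pass."""
    return math.ceil(n_elems / processors) * cycles_per_add

def utilization(n_elems: int, processors: int) -> float:
    """Fraction of processors busy during a pass."""
    return min(n_elems, processors) / processors

# The example from the text: two vectors of 32 64-bit numbers.
print(vector_add_cycles(32, processors=32, cycles_per_add=1))       # 1
print(vector_add_cycles(32, processors=2048, cycles_per_add=WORD))  # 64
print(utilization(32, 2048))                                        # 0.015625
print(bit_serial_add(2**64 - 1, 1))                                 # (0, 64)
```

Under this model the two chips tie when the vectors are long enough to occupy the bit-serial array (for example, both need 64 cycles at 2,048 elements); for short vectors such as the 32-element case, the bit-parallel chip finishes far sooner.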

Since our design takes advantage of existing electronic circuits, it is highly scalable. With 0.5-micron HP14TB CMOS technology [10], the area of the 64-bit ALU can be reduced by a factor of 3. This savings in chip area can be used to implement new processor instructions, add registers, expand the processor bit-parallelism, or increase the number of processors on the chip. Note that the number of on-chip optical I/O channels is limited by the power consumption of the receiver/transmitter circuits. With current hybrid CMOS-SEED technology, 200-2,000 optical I/O channels can be achieved with a power consumption of 3-5 mW per channel operating at 50-100 megabits per second (Mbps) per channel. One way to reduce the number of optical I/O channels is to operate the optical links at a higher speed than the silicon circuitry. In this approach, multiple bits are multiplexed and transmitted over the same optical link, thereby reducing the on-chip power consumption associated with optical I/Os.
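To make the I/O budget concrete, the sketch below estimates optical I/O power from the 3-5 mW per-channel figure and shows how time-multiplexing several bits onto one faster link reduces the channel count. The 6,144 bits per cycle (4,096 receivers plus 2,048 transmitters) come from the 32-processor chip described above. The multiplexing factors are hypothetical, and holding the per-channel power at 3-5 mW at higher link rates is an assumption that may not hold in practice.

```python
import math

def io_power_w(channels: int, mw_per_channel: float) -> float:
    """Total optical I/O power in watts."""
    return channels * mw_per_channel / 1000.0

def channels_needed(bits_per_cycle: int, mux_factor: int) -> int:
    """Optical links needed when mux_factor bits share one link per clock."""
    return math.ceil(bits_per_cycle / mux_factor)

bits_per_cycle = 4096 + 2048   # receivers + transmitters, 32-processor chip
clock_mhz = 100

for mux in (1, 2, 4):          # hypothetical multiplexing factors
    ch = channels_needed(bits_per_cycle, mux)
    link_mbps = clock_mhz * mux
    lo, hi = io_power_w(ch, 3), io_power_w(ch, 5)
    print(f"mux {mux}: {ch} links at {link_mbps} Mbps, {lo:.1f}-{hi:.1f} W")
```

Without multiplexing the 6,144 channels exceed the 200-2,000 channel range quoted above; a 4:1 factor brings the count to 1,536 links, within that range.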

The development of a high-speed, high-capacity memory subsystem for our parallel processor design is an important issue. For example, the 1 cm² chip design with thirty-two 64-bit processors requires a multi-port memory system capable of reading 64 words and writing 32 words on every clock cycle, with each word having 64 bits. The methodology for designing smart pixel integrated circuits to implement such a memory subsystem is described in [11]. For example, Figure 2 shows a 16 kilobit photonic page buffer IC with random page access capability that was designed, fabricated, and electrically tested at 100 MHz [12]. This 21 mm² IC was fabricated in 0.8 micron HP26G CMOS technology and integrates approximately 200,000 transistors. With future 0.18 micron CMOS technology [13], the memory capacity can be increased to 256 megabits with a total throughput of 10 to 100 Gbps and a 10 ns access time.
With the above approach, a complete hybrid CMOS-SEED parallel processor can be built using just two chips interconnected using FSOI. This design integrates thirty-two 64-bit processors with a high-speed memory subsystem and achieves 3,200 64-bit MIPS when operating with a 100 MHz clock. Circuit optimization techniques, such as ALU pipelining and memory interleaving, can increase the clock speed by a factor of two to four. Further performance and/or functionality gains can be achieved using a more advanced CMOS process. Using additional processor and memory chips, we can develop systems with even higher performance, albeit at greater cost and complexity.

The remainder of this paper focuses on the design, layout, and electrical testing of the 64-bit microprocessor core that we have developed. Section 1.6 describes the processor architecture. In Section 1.7, the layout of the processor chip is examined. The results of electrical testing are detailed in Section 1.8. Finally, we wrap up with some conclusions.


Figure  2.  Layout  of  a  32  page,  504  bits/page  photonic  page  buffer  IC.  This  type  of  IC  could  be  used  as  the  memory 
subsystem  for  our  64-bit  processor  design. 

1.6.  Chip  Architecture 

A VLSI microprocessor is specified by its architecture and instruction set. The architecture of our 64-bit microprocessor core is shown in Figure 3. Due to chip area constraints, we have chosen to implement a small subset (or core) of the functionality used in modern VLSI microprocessors. This includes a 64-bit arithmetic-logic unit (ALU), which implements sixteen (16) Boolean and thirty-two (32) fixed-point arithmetic operations; a 64-bit electrically scannable register (ESR), which provides electrical test and on-chip data storage capability; and several 64-bit multiplexer arrays, which control the dataflow throughout the chip.
The ALU circuit has two 64-bit data inputs and a single 64-bit data output (see Figure 4). Table 1 shows the Boolean and arithmetic operations that can be performed by the ALU. Arithmetic functions propagate the carry bit from the least significant bit (LSB) to the most significant bit (MSB), while logic functions operate on all 64 bits in parallel. To ensure high-speed operation for arithmetic functions, the ALU circuit employs dedicated parallel carry-lookahead generation circuitry. Control signals for the ALU are input from electrical pads. The ALU carry and overflow signals are output electrically. The two 64-bit ALU inputs are generated from 128 optical pads. The 64-bit ALU output is fed into a 64-bit ESR circuit.
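The report specifies dedicated parallel carry-lookahead circuitry but not its structure. As one illustration, the sketch below computes all 64 carries with a Kogge-Stone parallel-prefix network (six combining levels for 64 bits), a standard lookahead formulation that may differ from the circuit actually used on the chip. It also derives the carry-out and signed-overflow flags that the ALU reports electrically. `cla_add64` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1

def cla_add64(a: int, b: int, cin: int = 0):
    """64-bit add with Kogge-Stone parallel-prefix carry lookahead.

    Bit i of each integer below is the signal for bit position i.
    Returns (sum, carry_out, signed_overflow).
    """
    a &= MASK64
    b &= MASK64
    g = a & b   # generate: this bit produces a carry by itself
    p = a ^ b   # propagate: this bit passes an incoming carry along

    # Prefix levels with spans 1, 2, 4, ..., 32. After the loop, G[i] and P[i]
    # are the group generate/propagate for bits i..0. Positions with nothing
    # below them combine with the identity element (G=0, P=1).
    G, P = g, p
    span = 1
    while span < 64:
        G = (G | (P & (G << span))) & MASK64          # uses P from the previous level
        P = (P & ((P << span) | ((1 << span) - 1))) & MASK64
        span <<= 1

    cin_vec = MASK64 if cin else 0
    carry_out_vec = G | (P & cin_vec)                          # carry out of each bit
    carry_in_vec = ((carry_out_vec << 1) | (cin & 1)) & MASK64  # carry into each bit
    s = p ^ carry_in_vec
    cout = (carry_out_vec >> 63) & 1
    overflow = ((carry_in_vec >> 63) & 1) ^ cout   # carry into MSB differs from carry out
    return s, cout, overflow
```

Every carry depends on only six levels of two-input combining rather than a 64-stage ripple, which is what lets a lookahead adder finish a 64-bit addition within one clock cycle.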


Figure  3.  Architecture  of  the  64-bit  microprocessor  core  design. 
Figure 4. Block diagram of the 64-bit ALU circuit.
The ESR circuit is composed of a 64-element array of edge-triggered flip-flops and a 64-element array of three-to-one (3-to-1) multiplexers. These circuits are connected to implement a 64-bit bidirectional shift register with serial electrical read/write access from both the MSB and LSB sides of the register. The major purpose of the ESR circuit is to simplify electrical and optical testing of the chip, although it can also be used to implement bit-shift and bit-rotate instructions during normal processor operation. Specifically, each of the 64 optical transmitters and 128 optical receivers is individually accessible from an electrical I/O pad using the ESR circuit (this access is provided by appropriately shifting the data through the ESR and configuring the ALU to operate in pass-through mode). The output of the ESR circuit drives a 64-element array of two-to-one (2-to-1) multiplexers. This multiplexer array, in turn, drives the 64-element array of optical MQW modulators. The purpose of the 2-to-1 multiplexer array is to permit bypassing of the ESR circuit, allowing the ALU outputs to directly drive the optical modulator array.
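A behavioral sketch of the ESR and its bypass multiplexer helps show how serial access reaches every transmitter. Because the report does not list them, we assume the three inputs of each 3-to-1 multiplexer are parallel load from the ALU, shift toward the MSB, and shift toward the LSB. The class, method, and function names are hypothetical.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

class ESR:
    """Hypothetical behavioral model of the 64-bit electrically scannable register."""

    def __init__(self) -> None:
        self.q = 0   # flip-flop contents; bit 63 is the MSB

    def load(self, alu_out: int) -> None:
        """Mux input 0 (assumed): parallel load from the ALU output."""
        self.q = alu_out & MASK

    def shift_toward_msb(self, serial_in: int) -> int:
        """Mux input 1 (assumed): MSB leaves the register, new bit enters at the LSB."""
        out = (self.q >> (WIDTH - 1)) & 1
        self.q = ((self.q << 1) | (serial_in & 1)) & MASK
        return out

    def shift_toward_lsb(self, serial_in: int) -> int:
        """Mux input 2 (assumed): LSB leaves the register, new bit enters at the MSB."""
        out = self.q & 1
        self.q = (self.q >> 1) | ((serial_in & 1) << (WIDTH - 1))
        return out

def modulator_drive(esr: ESR, alu_out: int, bypass: bool) -> int:
    """2-to-1 mux array: drive the 64 modulators from the ALU (bypass) or the ESR."""
    return (alu_out & MASK) if bypass else esr.q

# Scan test: stream a repeating 1010... pattern into the LSB end and watch the
# MSB end. The first 64 outputs are the register's initial zeros; the pattern
# emerges after 64 clock cycles.
esr = ESR()
pattern = [1, 0] * 64
observed = [esr.shift_toward_msb(bit) for bit in pattern]
print(observed[:64] == [0] * 64, observed[64:] == pattern[:64])   # True True
```

The same two shift paths let a test program read any single transmitter bit by clocking it to either end of the register.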


Figure  5.  Optimal  floorplan  for  the  64-bit  processor  using  the  datapath  layout  style. 

1.7.  Chip  Layout 

Currently, the most popular smart pixel chip layout scheme is to design a self-contained “smart pixel” circuit with electronic processing circuitry, optical transmitter/receiver circuitry, and optical I/O devices. This circuit is then replicated in a two-dimensional “smart pixel” array structure. While this approach is highly effective for simple bit-serial processors, it is difficult to apply to the layout of a 64-bit microprocessor core. In our 64-bit processor design, most of the chip area is taken up by the 64-bit ALU and 64-bit ESR circuits. Typically, these VLSI circuits use a datapath layout style that creates a highly regular row and column structure, as shown in Figure 5. The datapath layout style is preferred for multiple-bit processing circuits because it achieves the most uniform timing for all bits in a word and minimizes routing congestion for these types of circuits. However, partitioning a datapath circuit to fit the conventional “smart pixel” layout quickly becomes impractical. The challenge lies in dividing the datapath circuit into many small sub-circuits that are replicated in a two-dimensional array while still retaining efficient routing and uniform timing characteristics.


Figure  6.  Microphotograph  showing  portion  of  a  CMOS-SEED  chip  with  optical  devices  integrated  directly  on  top  of 
active  CMOS  circuits.  See  reference  8  for  details. 
Our method for laying out the 64-bit processor smart pixel IC is to integrate the optical devices (i.e., photodetectors and modulators) directly over active silicon VLSI circuits. This layout methodology was recently proposed and experimentally demonstrated for CMOS-SEED circuits [8]. Figure 6 shows a microphotograph from reference 8 in which GaAs quantum well diodes are integrated directly on top of a two kilobit CMOS FIFO memory circuit. This layout approach enables us to efficiently combine a VLSI datapath layout with a 2-D array of optical devices. In our case, the 64-bit datapath was further divided into two 32-bit portions to achieve a 1.5 mm × 1.5 mm floorplan, as shown in Figure 7. This modification was necessary to fit the 64-bit processor design within the allocated 2 mm × 2 mm chip area. The 128 transimpedance amplifiers, which convert photodetector current into digital logic levels, are arranged in two rows of 64 columns. The 64 optical modulators are driven directly from the outputs of the 64-bit 2-to-1 multiplexer array. Figure 8 shows the complete layout of the 64-bit processor chip together with the third-level metal routing used for the optical I/O padframe. The next paragraph describes the optical and electrical I/O pad frames used in our chip design.

The optical I/O “pad frame” is arranged as a 20 by 10 array providing a total of 200 quantum well diodes that operate as MQW modulators or PIN photodetectors, depending on the applied bias voltage. The electrical I/O and power pads are positioned at the periphery of the chip. Figure 9 shows the position of the electrical and optical pads on the 64-bit processor chip. In this figure, a vertical separator line divides the sub-array of photodetectors on the left side from the smaller sub-array of modulators on the right side. The labeling scheme for optical I/O pads numbers rows from top to bottom, while columns are lettered from left to right. Since the 64-bit processor design uses 192 optical I/O (128 photodetectors and 64 modulators), eight (8) optical I/O pads are unused.
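The labeling convention (columns lettered A through T from left to right, rows numbered 1 through 10 from top to bottom, matching the A1-T10 range in the padframe floorplan) can be enumerated directly. The sketch below generates the 200 labels and confirms the count of unused pads; which specific pads are unused is not stated in the report, so it is not modeled. The function names are illustrative.

```python
import string

COLS = string.ascii_uppercase[:20]   # 'A'..'T': columns lettered left to right
ROWS = range(1, 11)                  # 1..10: rows numbered top to bottom

def pad_labels() -> list:
    """All optical pad labels in row-major order: A1, B1, ..., T1, A2, ..., T10."""
    return [f"{col}{row}" for row in ROWS for col in COLS]

def pad_position(label: str) -> tuple:
    """Zero-based (row, column) of a pad label such as 'C7'."""
    return int(label[1:]) - 1, COLS.index(label[0])

labels = pad_labels()
used = 128 + 64   # photodetectors + modulators on the 64-bit processor
print(len(labels), len(labels) - used)             # 200 8
print(pad_position("A1"), pad_position("T10"))     # (0, 0) (9, 19)
```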
Figure 7. Floorplan used in the 64-bit processor chip.


1.8.  Chip  Testing 

At the time this paper was written, only the electrical version of the chip had been fabricated and was therefore available for testing. This section describes the functional testing undertaken to verify correct electrical operation of the chip, as well as the test fixture used to perform it.

Functional testing was performed using a compact, self-contained test fixture that uses an FPGA chip to supply high-speed electrical signals to the 64-bit processor chip and to monitor its outputs. The test program resides in an EPROM chip, and the system clock is derived from a 100 MHz (or slower) clock generator chip. Physically, the test fixture is built on a 6” × 9” four-layer high-performance printed circuit board (PCB). Typical operation of the test fixture requires several steps. First, the test program is written on a Sun workstation in the VHSIC hardware description language (VHDL) [2]. It is simulated against a software model of the 64-bit processor chip to verify correct functionality. Next, the test program is synthesized to an FPGA using a synthesis CAD tool, and an EPROM bit-file is generated. An EPROM programmer is then used to write the bit-file into an EPROM chip, which is inserted into the ZIF socket on the PC board. Finally, when the PC board is powered on, the test program is loaded into the FPGA chip and executed automatically, supplying test vectors to the 64-bit processor chip.


Figure 8. This figure shows the layout of the 64-bit processor chip (top) and the optical I/O padframe routed in third-level metal (bottom) directly above active silicon circuitry.
Figure 9. Floorplan for the 64-bit processor chip showing the electrical I/O pads located on the periphery of the chip
and  the  2-D  optical  I/O  padframe  (pads  A1  through  T10)  located  in  the  center  of  the  chip. 

The first test performed on the chip was an electrical continuity check between power and ground pins. The purpose of this test is to check for power rail shorts on the IC. This type of defect can have disastrous consequences and must be checked before applying power to the chip. Next, the power consumption of the chip was measured to verify that the 128 transimpedance amplifiers used to amplify photodetector current are properly biased. Since the digital portion of the chip uses CMOS logic circuitry, it has essentially no static power consumption. The transimpedance amplifiers, on the other hand, always draw DC current and thus have non-zero static power consumption. For this test, the chip current draw was measured with the power pins connected and all input signal pins tied to ground. The total current drawn from the supply was measured at 78 mA, corresponding to 0.39 W power consumption. Thus, each transimpedance amplifier draws about 610 µA of current (78 mA / 128 amplifiers). This measurement is in excellent agreement with Spice simulation of the transimpedance amplifier, which predicts a current draw of 650 µA.
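The arithmetic behind this bias check is easy to reproduce. The sketch below attributes all static current to the 128 amplifiers (as the text does, since the CMOS logic draws essentially no static current) and compares the per-amplifier figure with the Spice prediction. The supply voltage is not stated in the report; it is inferred here from the quoted power and current.

```python
total_current_ma = 78.0   # measured supply current, inputs tied to ground
total_power_w = 0.39      # quoted power consumption
n_amplifiers = 128        # transimpedance amplifiers
spice_ua = 650.0          # Spice-predicted current per amplifier

supply_v = total_power_w / (total_current_ma / 1000)    # inferred, not stated
per_amp_ua = total_current_ma * 1000 / n_amplifiers     # all static current in the TIAs
deviation_pct = (per_amp_ua - spice_ua) / spice_ua * 100

print(f"implied supply: {supply_v:.2f} V")
print(f"per amplifier: {per_amp_ua:.1f} uA vs Spice {spice_ua:.0f} uA ({deviation_pct:+.1f}%)")
```

The measured value comes out roughly 6% below the simulated one, and the implied supply is 5 V.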
Figure 10. This figure shows the output of the ESR circuit test working at 20 Mbit/s.

The next step was to verify the correct functionality of the chip. Since the 128 optical inputs and 64 optical outputs were not directly accessible, testing had to rely on the ESR circuit to shift out the data generated by the 64-bit ALU. To test the ESR circuit itself, a repeating pattern (1010...) was input to the shift-in port of the ESR and observed to emerge from the shift-out port after 64 clock cycles. Figure 10 shows the measured chip output for the ESR shift-out signal. This test was performed at 100 MHz with electrical signal rise and fall times below 2 nanoseconds, indicating that the ESR circuit is capable of operating at 100 MHz clock rates.


Figure 11. Measured results from testing the not(A) ALU function.

It is possible to test the 64-bit ALU even though its inputs are not directly accessible. The 128 ALU inputs are connected to the outputs of 128 transimpedance amplifiers. Without the optical devices on the chip, all the transimpedance amplifiers should produce a logic zero (‘0’) output. With all ALU inputs at ‘0’, the operation of many of its functions can still be verified. The first test performed was to check the not(A) and not(B) functions of the ALU: inverting an all-‘0’ 64-bit input should produce an all-‘1’ 64-bit output. The ALU control signals were configured to perform the not(A) function. Next, the appropriate control signals were applied to load the 64-bit ALU output into the ESR. The ESR circuit was then clocked to shift the 64-bit result off the chip. While the 64-bit result was shifted out from the left side of the ESR, the right side of the ESR was configured to shift in zeros (‘0’). The test was set up to monitor the output of the ESR and to restart upon detecting a zero output. Figure 11 shows the measured ESR output for this test, where the output of the ALU is ‘1’ as expected and the ‘0’ pulse indicates the end of a complete test cycle. Using a similar approach, a number of other ALU functions were electrically tested.
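The not(A) procedure can be simulated end to end. The sketch assumes, as the text does, that every transimpedance amplifier outputs ‘0’ without optical devices, and further assumes that the "left side" of the ESR is the MSB end. It loads the inverted word, shifts it out MSB first while zeros enter at the LSB end, and stops at the first ‘0’, which marks the end of a test cycle. `run_not_a_test` is a hypothetical name.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def run_not_a_test(a_inputs: int = 0) -> list:
    """Return the serial ESR output observed during one not(A) test cycle."""
    alu_out = ~a_inputs & MASK          # ALU configured for not(A)
    register = alu_out                  # load ALU result into the ESR
    observed = []
    while True:
        msb = (register >> (WIDTH - 1)) & 1
        observed.append(msb)
        if msb == 0:                    # zero detected: end of test cycle, restart
            break
        register = (register << 1) & MASK   # a zero enters at the LSB side
    return observed

stream = run_not_a_test()
print(stream.count(1), stream[-1])      # 64 0
```

A single ALU bit stuck at ‘0’ would end the stream early, so the length of the ‘1’ run between restart pulses directly shows whether all 64 output bits are correct.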

To test the high-speed operation of the ALU, a sequence of ALU operations was repeatedly applied to produce a periodic bit pattern (11010...) in the MSB position of the 64-bit ALU output. This 64-bit output was then loaded into the ESR, with the MSB position becoming immediately visible at an electrical output pin. The test was run at different clock speeds up to the maximum 100 MHz supported by our test fixture. Figure 12 shows the measured, fully functional result for 100 MHz operation.
Figure 12. High-speed testing of 64-bit processor chip. The measurement on the left side shows correct operation at 50 MHz (i.e., 11010... output).